Computing the R of the QR factorization of tall and skinny matrices using MPI_Reduce
A QR factorization of a tall and skinny matrix with n columns can be
represented as a reduction. The operation used along the reduction tree takes
as input two n-by-n upper triangular matrices and produces as output the
n-by-n upper triangular matrix defined as the R factor of the two input
matrices stacked one on top of the other. This operation is binary,
associative, and commutative. We can therefore leverage the MPI library's
capabilities by using user-defined MPI operations and MPI_Reduce to perform
this reduction. The resulting code is compact and portable. In this context,
the user relies on the MPI library to select a reduction tree appropriate for
the underlying architecture.
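In a real implementation the combine operation is registered with MPI_Op_create and passed to MPI_Reduce. As a serial sketch of the operation itself (NumPy standing in for the MPI pieces; the function name and block count are illustrative), the combine step and the reduction it induces look like:

```python
import numpy as np

def combine_R(R1, R2):
    """Combine step of the reduction: given two n-by-n upper triangular
    factors, return the R factor of the 2n-by-n stacked matrix."""
    stacked = np.vstack([R1, R2])
    # mode='r' returns only the R factor of the QR factorization
    return np.linalg.qr(stacked, mode='r')

# Serial emulation of the reduction over row blocks of a tall matrix A
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
blocks = np.split(A, 4)                          # 4 "processes"
Rs = [np.linalg.qr(B, mode='r') for B in blocks] # local QR on each block
R = Rs[0]
for Ri in Rs[1:]:                                # reduction over the tree
    R = combine_R(R, Ri)
# Up to column signs, R now equals the R factor of A itself,
# so R^T R reproduces the Gram matrix A^T A.
```

With MPI, `combine_R` would be wrapped in a user-defined operation and the loop replaced by a single `MPI_Reduce` call, letting the library choose the tree shape.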
The Problem with the Linpack Benchmark Matrix Generator
We characterize the matrix sizes for which the Linpack Benchmark matrix
generator constructs a matrix with identical columns.
Fast Parallel Randomized QR with Column Pivoting Algorithms for Reliable Low-rank Matrix Approximations
Factorizing large matrices by QR with column pivoting (QRCP) is substantially
more expensive than QR without pivoting, owing to communication costs required
for pivoting decisions. In contrast, randomized QRCP (RQRCP) algorithms have
proven empirically to be highly competitive in processing time with
high-performance implementations of QR on uniprocessor and shared-memory
machines, and as reliable as QRCP in pivot quality.
We show that RQRCP algorithms can be as reliable as QRCP with failure
probabilities exponentially decaying in oversampling size. We also analyze
efficiency differences among different RQRCP algorithms. More importantly, we
develop distributed memory implementations of RQRCP that are significantly
better than QRCP implementations in ScaLAPACK.
As a further development, we introduce the concept of and develop algorithms
for computing spectrum-revealing QR factorizations for low-rank matrix
approximations, and demonstrate their effectiveness against leading low-rank
approximation methods in both theoretical and numerical reliability and
efficiency.

Comment: 11 pages, 14 figures; accepted by the 2017 IEEE 24th International
Conference on High Performance Computing (HiPC); awarded the best paper prize.
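A minimal serial sketch of the RQRCP idea: column pivots are chosen by running QR with column pivoting on a small Gaussian sketch of A rather than on A itself. All names and the oversampling parameter p below are illustrative, not the paper's implementation; the pivoting here is a simple Gram–Schmidt form of QRCP.

```python
import numpy as np

def qrcp_pivots(B, k):
    """Greedy column pivoting (Gram-Schmidt form of QRCP): repeatedly
    select the column of largest residual norm and deflate the rest."""
    B = B.astype(float).copy()
    piv = []
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(B, axis=0)))
        piv.append(j)
        q = B[:, j] / np.linalg.norm(B[:, j])
        B -= np.outer(q, q @ B)      # remove the chosen direction
    return piv

def rqrcp_pivots(A, k, p=8, seed=None):
    """RQRCP sketch: pivot on a (k+p)-row Gaussian sketch of A.
    p is the oversampling size; larger p lowers the failure probability."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((k + p, m))  # Gaussian test matrix
    return qrcp_pivots(Omega @ A, k)         # pivot on the small sketch

# Demo on a nearly rank-3 matrix: the 3 selected columns span A well.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 50))
A += 1e-8 * rng.standard_normal((200, 50))
piv = rqrcp_pivots(A, k=3, seed=2)
Q, _ = np.linalg.qr(A[:, piv])
residual = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
```

The communication advantage comes from the fact that all pivoting decisions are made on the small sketched matrix, which is cheap to form and to share across processes.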
Algorithmic Based Fault Tolerance Applied to High Performance Computing
We present a new approach to fault tolerance for High Performance Computing
systems. Our approach is based on a careful adaptation of the
algorithm-based fault tolerance technique (Huang and Abraham, 1984) to the
needs of parallel distributed computation. We obtain a strongly scalable
mechanism for fault tolerance. We can also detect and correct errors (bit
flips) on the fly during a computation. To assess the viability of our
approach, we have developed a fault-tolerant matrix-matrix multiplication
subroutine and we propose some models to predict its running time. Our
parallel fault-tolerant matrix-matrix multiplication reaches 1.4 TFLOPS on
484 processors (the jacquard.nersc.gov cluster) and returns a correct result
even when one process failure occurs. This represents 65% of the machine's
peak and less than 12% overhead with respect to the fastest failure-free
implementation. We predict (and have observed) that, as the processor count
increases, the overhead of the fault tolerance drops significantly.
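The underlying checksum scheme of Huang and Abraham can be sketched as follows (a serial NumPy illustration of the encoding and the check, not the parallel subroutine described above; function names are ours):

```python
import numpy as np

def checksum_product(A, B):
    """Huang-Abraham checksum encoding: append a column-sum row to A and a
    row-sum column to B, then multiply. The result Cf contains A @ B in
    its leading block, with checksums in the last row and column."""
    Ac = np.vstack([A, A.sum(axis=0)])                 # column-checksum A
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])  # row-checksum B
    return Ac @ Br

def verify(Cf):
    """Return True when the product's checksums are consistent with the
    leading block, i.e. no entry has been corrupted."""
    C = Cf[:-1, :-1]
    return (np.allclose(Cf[-1, :-1], C.sum(axis=0)) and
            np.allclose(Cf[:-1, -1], C.sum(axis=1)))

A = np.arange(12.0).reshape(3, 4)
B = np.arange(8.0).reshape(4, 2)
Cf = checksum_product(A, B)
ok_before = verify(Cf)   # True on a fault-free run
Cf[1, 0] += 1.0          # simulate a bit flip in one entry
ok_after = verify(Cf)    # False: the corruption is detected
# A single corrupted entry sits at the intersection of the failing row
# and column checksums, so it can be located and corrected as well.
```

In the parallel setting these checksum rows and columns are distributed across processes, which is what makes the detection and recovery scalable.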
Computing the Conditioning of the Components of a Linear Least Squares Solution
In this paper, we address the accuracy of the results for the overdetermined
full rank linear least squares problem. We recall theoretical results obtained
in Arioli, Baboulin and Gratton, SIMAX 29(2):413--433, 2007, on conditioning of
the least squares solution and the components of the solution when the matrix
perturbations are measured in Frobenius or spectral norms. Then we define
computable estimates for these condition numbers and we interpret them in terms
of statistical quantities. In particular, we show that, in the classical
linear statistical model, the ratio of the variance of one component of the
solution to the variance of the right-hand side is exactly the condition
number of this solution component when perturbations on the right-hand side
are considered. We also provide code fragments using LAPACK routines to
compute the variance-covariance matrix and the least squares conditioning,
and we give the corresponding computational cost. Finally, we present a small
historical numerical example that was used by Laplace in Théorie Analytique
des Probabilités, 1820, for computing the mass of Jupiter, and experiments
from the space industry with real physical data.
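One such computable quantity can be sketched with NumPy standing in for the LAPACK routines (the function name is ours): in the classical model the variance-covariance matrix is sigma^2 (A^T A)^{-1}, and its diagonal, which scales the variances of the individual solution components, can be read off the R factor of A's QR factorization without ever forming the normal equations.

```python
import numpy as np

def solution_component_variances(A):
    """Diagonal of (A^T A)^{-1}, computed from the R factor of A.
    Since A^T A = R^T R, we have (A^T A)^{-1} = R^{-1} R^{-T}, whose
    diagonal entries are the squared row norms of R^{-1}. Scaled by
    sigma^2, these are the variances of the solution components."""
    R = np.linalg.qr(A, mode='r')       # n-by-n upper triangular factor
    Rinv = np.linalg.inv(R)             # a triangular solve in practice
    return (Rinv ** 2).sum(axis=1)      # squared row norms of R^{-1}

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 4))
d = solution_component_variances(A)
# d agrees with the diagonal of (A^T A)^{-1} computed directly,
# but avoids squaring the condition number via the normal equations.
```

In LAPACK terms this corresponds to a QR factorization followed by a triangular inversion of the small n-by-n factor, which is what makes the estimate cheap relative to the factorization itself.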